We study the composition style in deep image matting, a notion that characterizes a data generation flow on how to exploit limited foregrounds and random backgrounds to form a training dataset. Prior art executes this flow in a completely random manner by simply going through the foreground pool or by optionally combining two foregrounds before foreground-background composition. In this work, we first show that naive foreground combination can be problematic and therefore derive an alternative formulation to reasonably combine foregrounds. Our second contribution is an observation that matting performance can benefit from a certain occurrence frequency of combined foregrounds and their associated source foregrounds during training. Inspired by this, we introduce a novel composition style that binds the source and combined foregrounds in a definite triplet. In addition, we also find that different orders of foreground combination lead to different foreground patterns, which further inspires a quadruplet-based composition style. Results under controlled experiments on four matting baselines show that our composition styles outperform existing ones and invite consistent performance improvement on both composited and real-world datasets. Code is available at: https://github.com/coconuthust/composition_styles
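To make the data-generation flow above concrete, the following minimal sketch shows standard foreground-background compositing together with a commonly used two-foreground merge heuristic of the kind the paper argues can be problematic; it does not reproduce the paper's derived combination or its triplet/quadruplet styles, and all array names and shapes are illustrative assumptions.

```python
# Minimal sketch of the conventional composition flow: alpha compositing onto
# a random background, plus a commonly used naive two-foreground merge.
# Not the paper's proposed formulation; shapes/names are assumptions.
import numpy as np

def composite(fg, alpha, bg):
    """Standard alpha compositing: I = alpha * F + (1 - alpha) * B."""
    return alpha[..., None] * fg + (1.0 - alpha[..., None]) * bg

def naive_combine(fg1, alpha1, fg2, alpha2):
    """A commonly used naive merge of two foregrounds before composition."""
    alpha = 1.0 - (1.0 - alpha1) * (1.0 - alpha2)              # union of coverage
    fg = alpha1[..., None] * fg1 + (1.0 - alpha1)[..., None] * fg2
    return fg, alpha

# Usage with random data standing in for real foregrounds/backgrounds.
h, w = 64, 64
fg1, fg2, bg = (np.random.rand(h, w, 3) for _ in range(3))
a1, a2 = (np.random.rand(h, w) for _ in range(2))
fg_c, a_c = naive_combine(fg1, a1, fg2, a2)
image = composite(fg_c, a_c, bg)
print(image.shape)  # (64, 64, 3)
```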
The number of international benchmarking competitions is steadily increasing in various fields of machine learning (ML) research and practice. So far, however, little is known about the common practices and bottlenecks faced by the community in tackling the research questions posed. To shed light on the status quo of algorithm development in the specific field of biomedical image analysis, we designed an international survey that was issued to all participants of challenges conducted in conjunction with the IEEE ISBI 2021 and MICCAI 2021 conferences (80 competitions in total). The survey covered participants' expertise and working environments, their chosen strategies, as well as algorithm characteristics. A median of 72% of challenge participants took part in the survey. According to our results, knowledge exchange was the primary incentive (70%) for participation, while receiving prize money played only a minor role (16%). While a median of 80 working hours was spent on method development, a large portion of participants stated that they did not have enough time for method development (32%), and 25% perceived the infrastructure to be a bottleneck. Overall, 94% of all solutions were deep learning-based; of these, 84% were based on standard architectures. 43% of the respondents reported that the data samples (e.g., images) were too large to be processed at once. This was most commonly addressed by patch-based training (69%), downsampling (37%), and solving 3D analysis tasks as a series of 2D tasks. K-fold cross-validation on the training set was performed by only 37% of the participants, and only 50% of the participants performed ensembling, based either on multiple identical models (61%) or on heterogeneous models (39%). 48% of the respondents applied postprocessing steps.
This paper focuses on analyzing and improving the commonsense ability of recent popular vision-language (VL) models. Despite the great success, we observe that existing VL-models still lack commonsense knowledge/reasoning ability (e.g., "Lemons are sour"), which is a vital component towards artificial general intelligence. Through our analysis, we find one important reason is that existing large-scale VL datasets do not contain much commonsense knowledge, which motivates us to improve the commonsense of VL-models from the data perspective. Rather than collecting a new VL training dataset, we propose a more scalable strategy, i.e., "Data Augmentation with kNowledge graph linearization for CommonsensE capability" (DANCE). It can be viewed as one type of data augmentation technique, which can inject commonsense knowledge into existing VL datasets on the fly during training. More specifically, we leverage the commonsense knowledge graph (e.g., ConceptNet) and create variants of text description in VL datasets via bidirectional sub-graph sequentialization. For better commonsense evaluation, we further propose the first retrieval-based commonsense diagnostic benchmark. By conducting extensive experiments on some representative VL-models, we demonstrate that our DANCE technique is able to significantly improve the commonsense ability while maintaining the performance on vanilla retrieval tasks. The code and data are available at https://github.com/pleaseconnectwifi/DANCE
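As a rough illustration of what knowledge-graph linearization could look like in practice, the hypothetical sketch below turns ConceptNet-style triples into sentence variants (in both directions) that can be appended to captions on the fly; the templates and the tiny triple list are assumptions for illustration, not the released DANCE code.

```python
# Hypothetical sketch of knowledge-graph linearization in the spirit of DANCE:
# a ConceptNet-style triple is turned into a natural-language sentence, in
# either direction, and mixed into existing VL training text during training.
import random

TEMPLATES = {
    "HasProperty": ("{h} is {t}.", "{t} is a property of {h}."),
    "IsA":         ("{h} is a kind of {t}.", "{t} includes {h}."),
    "UsedFor":     ("{h} is used for {t}.", "{t} can be done with {h}."),
}

TRIPLES = [
    ("lemon", "HasProperty", "sour"),
    ("lemon", "IsA", "citrus fruit"),
]

def linearize(triple, direction="forward"):
    head, rel, tail = triple
    fwd, bwd = TEMPLATES[rel]
    return (fwd if direction == "forward" else bwd).format(h=head, t=tail)

def augment_caption(caption, triples=TRIPLES):
    """Append one linearized commonsense fact to an existing caption."""
    triple = random.choice(triples)
    direction = random.choice(["forward", "backward"])
    return f"{caption} {linearize(triple, direction)}"

print(augment_caption("A lemon on a wooden table."))
```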
Large language models (LLMs) have been shown to be able to perform new tasks based on a few demonstrations or natural language instructions. While these capabilities have led to widespread adoption, most LLMs are developed by resource-rich organizations and are frequently kept from the public. As a step towards democratizing this powerful technology, we present BLOOM, a 176B-parameter open-access language model designed and built thanks to a collaboration of hundreds of researchers. BLOOM is a decoder-only Transformer language model that was trained on the ROOTS corpus, a dataset comprising hundreds of sources in 46 natural and 13 programming languages (59 in total). We find that BLOOM achieves competitive performance on a wide variety of benchmarks, with stronger results after undergoing multitask prompted finetuning. To facilitate future research and applications using LLMs, we publicly release our models and code under the Responsible AI License.
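For readers who want to try the released checkpoints, a minimal few-shot prompting sketch with the Hugging Face transformers library is given below; it assumes the smaller bigscience/bloom-560m variant for illustration, since the full 176B model requires multi-GPU or offloaded inference.

```python
# Minimal sketch of few-shot prompting with an open BLOOM checkpoint.
# The small "bigscience/bloom-560m" variant is used here for illustration.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "bigscience/bloom-560m"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

prompt = (
    "Translate English to French.\n"
    "English: The cat sleeps.\nFrench: Le chat dort.\n"
    "English: I like coffee.\nFrench:"
)
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=10, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```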
We introduce point affiliation into feature upsampling, a notion that describes the affiliation of each upsampled point to a semantic cluster formed by local decoder feature points with semantic similarity. By rethinking point affiliation, we present a generic formulation for generating upsampling kernels. The kernels encourage not only semantic smoothness but also boundary sharpness in the upsampled feature maps. Such properties are particularly useful for some dense prediction tasks such as semantic segmentation. The key idea of our formulation is to generate similarity-aware kernels by comparing the similarity between each encoder feature point and the spatially associated local region of decoder features. In this way, an encoder feature point can function as a cue to inform the semantic cluster of the upsampled feature points. To embody the formulation, we further instantiate a lightweight upsampling operator, termed Similarity-Aware Point Affiliation (SAPA), and investigate its variants. SAPA invites consistent performance improvements on a number of dense prediction tasks, including semantic segmentation, object detection, depth estimation, and image matting. Code is available at: https://github.com/poppinace/sapa
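The sketch below is a simplified illustration of a similarity-aware upsampling kernel in the spirit of this formulation, not the official SAPA operator: each upsampled point compares its encoder feature with a local window of decoder features, and the softmax over these similarities defines the kernel. It assumes both features have already been projected to a common embedding dimension.

```python
# Simplified sketch of similarity-aware kernel generation for 2x upsampling.
# Not the official SAPA operator; projection layers and variants are omitted.
import torch
import torch.nn.functional as F

def similarity_aware_upsample(enc, dec, k=5):
    """enc: (B, C, 2H, 2W) encoder feature; dec: (B, C, H, W) decoder feature."""
    B, C, H, W = dec.shape
    # Gather a k x k decoder neighborhood for every low-res position ...
    neigh = F.unfold(dec, kernel_size=k, padding=k // 2)           # (B, C*k*k, H*W)
    neigh = neigh.view(B, C, k * k, H, W)
    # ... and replicate it to the upsampled (2H, 2W) grid.
    neigh = neigh.repeat_interleave(2, dim=-2).repeat_interleave(2, dim=-1)
    # Dot-product similarity between each encoder point and its decoder window.
    sim = (enc.unsqueeze(2) * neigh).sum(dim=1)                    # (B, k*k, 2H, 2W)
    kernel = sim.softmax(dim=1)                                    # similarity-aware kernel
    # Reassemble the upsampled feature as a kernel-weighted sum of the window.
    return (kernel.unsqueeze(1) * neigh).sum(dim=2)                # (B, C, 2H, 2W)

enc = torch.randn(1, 32, 64, 64)
dec = torch.randn(1, 32, 32, 32)
print(similarity_aware_upsample(enc, dec).shape)  # torch.Size([1, 32, 64, 64])
```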
In cooperative multi-agent reinforcement learning (MARL), combining value decomposition with actor-critic methods enables agents to learn stochastic policies, which are more suitable for partially observable environments. Given the goal of learning local policies that allow decentralized execution, agents are commonly assumed to be independent of each other, even in centralized training. However, such an assumption may prohibit agents from learning the optimal joint policy. To address this problem, we explicitly take the dependency among agents into centralized training. Although this leads to the optimal joint policy, it may not be factorizable for decentralized execution. Nevertheless, we theoretically show that from such a joint policy we can always derive another joint policy that achieves the same optimality but can be factorized for decentralized execution. To this end, we propose multi-agent conditional policy factorization (MACPF), which takes more centralized training but still enables decentralized execution. We empirically verify MACPF on various cooperative MARL tasks and demonstrate that MACPF achieves better performance or faster convergence than baselines.
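As a toy illustration of the distinction drawn above, and not MACPF itself, the sketch below implements a dependent (auto-regressive) joint policy in which each agent additionally conditions on the actions of the preceding agents during centralized training, in contrast to a fully factorized policy where each agent only sees its own observation; the network sizes and shapes are assumptions.

```python
# Toy dependent joint policy: agent i conditions on actions of agents < i.
# Illustration only; not the MACPF algorithm.
import torch
import torch.nn as nn

class DependentJointPolicy(nn.Module):
    def __init__(self, n_agents, obs_dim, n_actions, hidden=64):
        super().__init__()
        self.n_actions = n_actions
        # Agent i sees its own observation plus one-hot actions of agents < i.
        self.nets = nn.ModuleList([
            nn.Sequential(
                nn.Linear(obs_dim + i * n_actions, hidden), nn.ReLU(),
                nn.Linear(hidden, n_actions),
            )
            for i in range(n_agents)
        ])

    def forward(self, obs):                      # obs: (batch, n_agents, obs_dim)
        prev_actions, logps = [], []
        for i, net in enumerate(self.nets):
            inp = torch.cat([obs[:, i]] + prev_actions, dim=-1)
            dist = torch.distributions.Categorical(logits=net(inp))
            a = dist.sample()
            logps.append(dist.log_prob(a))
            prev_actions.append(nn.functional.one_hot(a, self.n_actions).float())
        # Log-probability of the dependent joint policy: sum over agents.
        return torch.stack(logps, dim=-1).sum(-1)

print(DependentJointPolicy(3, 8, 4)(torch.randn(5, 3, 8)).shape)  # torch.Size([5])
```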
Compared with single-document summarization, abstractive multi-document summarization (MDS) poses challenges in representing and covering its lengthy and linked sources. This study develops a Parallel Hierarchical Transformer (PHT) with attention alignment for MDS. By incorporating word- and paragraph-level multi-head attention, the hierarchical architecture of PHT allows better processing of dependencies at both the token and document levels. To guide the decoding towards better coverage of the source documents, an attention-alignment mechanism is then introduced to calibrate beam search with predicted optimal attention distributions. A comprehensive evaluation is conducted on the WikiSum data to test the improvements brought to MDS by the proposed architecture. By better handling intra- and cross-document information, both the ROUGE and human evaluation results indicate that our hierarchical model generates summaries of higher quality at a relatively low computational cost.
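A rough sketch of the parallel word- and paragraph-level cross-attention idea is given below, with all module choices being assumptions rather than the PHT implementation: a decoder state attends to token-level and paragraph-level source memories in parallel, and the two context vectors are fused.

```python
# Rough illustration of parallel word- and paragraph-level cross-attention.
# Not the PHT implementation; dimensions and fusion are assumptions.
import torch
import torch.nn as nn

class ParallelHierarchicalAttention(nn.Module):
    def __init__(self, d_model=256, n_heads=8):
        super().__init__()
        self.word_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.para_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.fuse = nn.Linear(2 * d_model, d_model)

    def forward(self, dec_states, word_mem, para_mem):
        word_ctx, _ = self.word_attn(dec_states, word_mem, word_mem)
        para_ctx, _ = self.para_attn(dec_states, para_mem, para_mem)
        return self.fuse(torch.cat([word_ctx, para_ctx], dim=-1))

layer = ParallelHierarchicalAttention()
dec = torch.randn(2, 16, 256)        # decoder states
words = torch.randn(2, 400, 256)     # token-level source memory
paras = torch.randn(2, 20, 256)      # paragraph-level source memory
print(layer(dec, words, paras).shape)  # torch.Size([2, 16, 256])
```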
Video deblurring is a highly ill-posed problem due to spatially and temporally varying blur. An intuitive approach to video deblurring involves two steps: a) detecting the blurry regions in the current frame; b) exploiting information from clear regions in adjacent frames to deblur the current frame. To realize this process, our idea is to detect the pixel-wise blur level of each frame and combine it with video deblurring. To this end, we propose a novel framework that utilizes a motion magnitude prior (MMP) as guidance for efficient deep video deblurring. Specifically, since the motion of a pixel along its trajectory during the exposure time is positively correlated with its motion blur level, we first use the average magnitude of optical flow over the high-frequency sharp frames to generate synthetic blurry frames and their corresponding pixel-wise motion magnitude maps. We then build a dataset consisting of blurry frame and MMP pairs, and the MMP is learned by a compact CNN through regression. The MMP encodes both spatial and temporal blur-level information, which can be further integrated into an efficient recurrent neural network (RNN) for video deblurring. We conduct intensive experiments on public datasets to validate the effectiveness of the proposed method.
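The following minimal sketch shows how a pixel-wise motion magnitude map could be derived as described, by averaging the optical-flow magnitude over the sharp frames that are blended into one synthetic blurry frame; the flow estimator is abstracted into a placeholder callable and is an assumption rather than part of the paper's pipeline.

```python
# Minimal sketch: build a pixel-wise motion magnitude map from optical flow,
# averaged over the sharp frames forming one synthetic blurry frame.
# The flow estimator is a placeholder (any off-the-shelf method could be used).
import numpy as np

def motion_magnitude_map(sharp_frames, estimate_flow):
    """sharp_frames: list of consecutive sharp frames forming one blurry frame."""
    mags = []
    for prev, nxt in zip(sharp_frames[:-1], sharp_frames[1:]):
        flow = estimate_flow(prev, nxt)               # (H, W, 2) displacement field
        mags.append(np.linalg.norm(flow, axis=-1))    # per-pixel motion magnitude
    return np.mean(mags, axis=0)                      # average over the exposure

# Dummy usage: random frames and a fake flow estimator.
frames = [np.random.rand(128, 128, 3) for _ in range(7)]
fake_flow = lambda a, b: np.random.randn(128, 128, 2)
mmp = motion_magnitude_map(frames, fake_flow)
print(mmp.shape)  # (128, 128)
```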
To boost a detector for single-frame 3D object detection, we present a new approach that trains it to simulate the features and responses of a detector trained on multi-frame point clouds. Our approach requires multi-frame point clouds only when training the single-frame detector; once trained, it can detect objects at inference with only a single-frame point cloud as input. We design a novel Simulated Multi-Frame Single-Stage object Detector (SMF-SSD) framework to realize the approach: multi-view dense object fusion to densify ground-truth objects for generating multi-frame point clouds; self-attention voxel distillation to facilitate one-to-many knowledge transfer from multi-frame to single-frame voxels; multi-scale BEV feature distillation to transfer knowledge in both low-level spatial and high-level semantic BEV features; and adaptive response distillation to activate single-frame responses with high confidence and accurate localization. Experimental results on the Waymo test set show that our SMF-SSD consistently outperforms all state-of-the-art single-frame 3D object detectors in both mAP and mAPH for all object classes at difficulty levels 1 and 2.
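As a generic illustration of the feature- and response-distillation losses such a framework relies on, the sketch below pairs a multi-frame teacher with a single-frame student; the exact loss forms and the confidence-based weighting are assumptions, not the SMF-SSD formulation.

```python
# Generic teacher-student distillation losses for BEV features and responses.
# Illustrative only; not the exact SMF-SSD losses.
import torch
import torch.nn.functional as F

def bev_feature_distill(student_bev, teacher_bev):
    """MSE between student and teacher BEV feature maps at the same scale."""
    return F.mse_loss(student_bev, teacher_bev)

def response_distill(student_logits, teacher_logits, conf_thresh=0.5):
    """Match student responses only where the teacher is confident."""
    teacher_prob = teacher_logits.sigmoid()
    mask = (teacher_prob.max(dim=1, keepdim=True).values > conf_thresh).float()
    per_pixel = F.mse_loss(student_logits.sigmoid(), teacher_prob, reduction="none")
    return (per_pixel * mask).sum() / mask.sum().clamp(min=1.0)

s_bev, t_bev = torch.randn(2, 64, 128, 128), torch.randn(2, 64, 128, 128)
s_cls, t_cls = torch.randn(2, 3, 128, 128), torch.randn(2, 3, 128, 128)
loss = bev_feature_distill(s_bev, t_bev) + response_distill(s_cls, t_cls)
print(loss.item())
```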
Regression learning is classic and fundamental in medical image analysis. It provides continuous mappings for many critical applications, such as attribute estimation, object detection, segmentation, and non-rigid registration. However, previous studies mainly took case-wise criteria, such as the mean squared error, as the optimization objective, ignoring the very important population-wise correlation criteria, which are exactly the final evaluation metrics in many tasks. In this work, we propose to revisit the classic regression task with a novel investigation into directly optimizing fine-grained correlation losses. We mainly explore two complementary correlation indexes as learnable losses: Pearson linear correlation (PLC) and Spearman rank correlation (SRC). The contributions of this paper are two-fold. First, for the PLC at the global level, we propose a strategy to make it robust against outliers and to regularize the key distribution factors. These efforts significantly stabilize learning and magnify the efficacy of PLC. Second, for the SRC at the local level, we propose a coarse-to-fine scheme to ease the learning of the exact ranking order among samples. Specifically, we convert the learning of sample rankings into the learning of similarity relationships among samples. We extensively validate our method on two typical ultrasound image regression tasks, including image quality assessment and biometric measurement. Experiments prove that, with the fine-grained guidance of directly optimizing the correlation, the regression performance is significantly improved. Our proposed correlation losses are general and can be extended to more important applications.
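A minimal sketch of a differentiable batch-level Pearson correlation loss of the kind discussed above is given below (minimized as 1 - PLC); the outlier-robust and distribution-regularization strategies, as well as the coarse-to-fine Spearman scheme, are omitted.

```python
# Minimal differentiable Pearson correlation loss over a batch of predictions.
# The paper's robustness and regularization strategies are not included.
import torch

def pearson_loss(pred, target, eps=1e-8):
    """pred, target: 1-D tensors of predictions and labels over a batch."""
    pred_c = pred - pred.mean()
    target_c = target - target.mean()
    plc = (pred_c * target_c).sum() / (pred_c.norm() * target_c.norm() + eps)
    return 1.0 - plc            # maximizing correlation = minimizing this loss

pred = torch.randn(32, requires_grad=True)
target = torch.randn(32)
loss = pearson_loss(pred, target)
loss.backward()
print(loss.item())
```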